We have presented a new axiomatic derivation of Shannon entropy for a discrete probability
distribution on the basis of the postulates of additivity and concavity of the entropy function. We
have then modified Shannon entropy to take account of observational uncertainty. The modified
entropy reduces, in the limiting case, to the form of Shannon differential entropy. As an application we
have derived the expression for the classical entropy of statistical mechanics from the quantized form
of the entropy.
1. Introduction



Shannon entropy is the key concept of information theory [1]. It has found wide applications
in different fields of science and technology [2-5]. It is a characteristic of a probability distribution,
providing a measure of the uncertainty associated with that distribution. There are different
approaches to the derivation of Shannon entropy based on different postulates or axioms [6, 7].

The object of the present paper is to stress the importance of the properties of additivity and
concavity in the determination of the functional form of Shannon entropy and its generalization. The
main content of the paper is divided into three sections. In section 2 we provide an axiomatic
derivation of Shannon entropy on the basis of the properties of additivity and concavity of the entropy
function. In section 3 we generalize Shannon entropy and introduce the notion of total
entropy to take account of observational uncertainty. The entropy of a continuous distribution, called
the differential entropy, is obtained as a limiting value. In section 4 the differential entropy,
along with the quantum uncertainty relation, is used to derive the expression for the classical
entropy of statistical mechanics.

2. Shannon Entropy : Axiomatic Characterization 

Let $\Delta_n$ be the set of all finite discrete probability distributions

$$P = \Big\{(p_1, p_2, \ldots, p_n),\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1\Big\}$$

In other words, P may be considered as a random experiment having n possible outcomes with
probabilities $(p_1, p_2, \ldots, p_n)$. There is uncertainty associated with the probability distribution P,
and there are different measures of uncertainty depending on different postulates or conditions. In
general, the uncertainty associated with the random experiment P is a mapping [8]

$$H(P) : \Delta_n \to \mathbb{R} \qquad (2.1)$$

where $\mathbb{R}$ is the set of real numbers. It can be shown that (2.1) is a reasonable measure of uncertainty
if and only if it is Schur-concave on $\Delta_n$ [8]. A general class of uncertainty measures is given by

$$H(P) = \sum_{i=1}^{n} \phi(p_i) \qquad (2.2)$$

where $\phi : [0, 1] \to \mathbb{R}$ is a concave function. By taking different concave functions defined on [0, 1],
we get different measures of uncertainty or entropy. For example, if we take $\phi(p_i) = -p_i \log p_i$, we
obtain the Shannon entropy

$$H(P) = H(p_1, p_2, \ldots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i \qquad (2.3)$$

where $0 \log 0 = 0$ by convention and k is a constant depending on the unit of measurement of
entropy. There are different axiomatic characterizations of Shannon entropy based on different sets
of axioms [6, 7]. In the following we present a different approach that depends on the concavity
of the entropy function. We require the following axioms to be satisfied by the entropy function
$H(P) = H(p_1, p_2, \ldots, p_n)$.
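As a concrete illustration of the Shannon form (2.3), the following minimal Python sketch evaluates the entropy of a few small distributions. The function name `shannon_entropy`, the choice k = 1 with natural logarithms, and the example distributions are assumptions made for illustration only; they do not appear in the text above.

```python
import math

def shannon_entropy(p, k=1.0):
    """Evaluate H(P) = -k * sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("p must be a probability distribution")
    # Terms with p_i = 0 are skipped, which implements 0 log 0 = 0.
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0)

# Illustrative distributions on four outcomes (assumed values).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

h_uniform = shannon_entropy(uniform)  # equals log 4 for four equally likely outcomes
h_skewed = shannon_entropy(skewed)
h_certain = shannon_entropy(certain)  # a certain outcome carries no uncertainty

print(h_uniform, h_skewed, h_certain)
```

The ordering of the three values reflects the Schur-concavity mentioned above: the more spread out the distribution, the larger the entropy.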

Axiom (1) : We assume that the entropy H(P) is non-negative, that is, for all $P = (p_1, p_2, \ldots, p_n)$,
$H(P) \ge 0$. This is essential for a measure.

Axiom (2) : We assume the generalized form (2.2) of the entropy function:

$$H(P) = \sum_{i=1}^{n} \phi(p_i) \qquad (2.4)$$

Axiom (3) : We assume that the function $\phi$ is a continuous concave function of its argument.

Axiom (4) : We assume the additivity of entropy, that is, for any two statistically independent
experiments $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_m)$,

$$H(PQ) = \sum_{j} \sum_{\alpha} \phi(p_j q_\alpha) = \sum_{j} \phi(p_j) + \sum_{\alpha} \phi(q_\alpha) \qquad (2.5)$$

Then we have the following theorem. 

THEOREM (2.1) : If the entropy function H(P) satisfies the above axioms (1) to (4), then
H(P) is given by

$$H(P) = -k \sum_{i=1}^{n} p_i \log p_i \qquad (2.6)$$

where k is a positive constant depending on the unit of measurement of entropy.
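Before turning to the proof, the additivity axiom (4) can be checked numerically for the Shannon form. The sketch below is only an illustration under stated assumptions: the distributions `P` and `Q`, the helper names, and the quadratic function used as a contrasting concave choice are hypothetical and are not taken from the text. It compares the two sides of (2.5) for $\phi(p) = -p \log p$ and for $\phi(p) = p(1 - p)$, which is also concave but does not satisfy (2.5).

```python
import math
from itertools import product

def entropy_sum(phi, p):
    """Generalized entropy H(P) = sum_i phi(p_i)."""
    return sum(phi(x) for x in p)

def phi_shannon(x):
    # Shannon choice with k = 1 and the convention 0 log 0 = 0.
    return -x * math.log(x) if x > 0 else 0.0

def phi_quadratic(x):
    # Another concave function on [0, 1], used here only as a counterexample.
    return x * (1.0 - x)

# Two statistically independent experiments (assumed values).
P = [0.5, 0.3, 0.2]
Q = [0.6, 0.4]
joint = [pj * qa for pj, qa in product(P, Q)]  # joint law p_j * q_alpha

shannon_gap = entropy_sum(phi_shannon, joint) - (
    entropy_sum(phi_shannon, P) + entropy_sum(phi_shannon, Q))
quadratic_gap = entropy_sum(phi_quadratic, joint) - (
    entropy_sum(phi_quadratic, P) + entropy_sum(phi_quadratic, Q))

print(shannon_gap, quadratic_gap)
```

The gap vanishes (up to rounding) for the Shannon choice and is clearly non-zero for the quadratic one, which is consistent with additivity singling out the logarithmic form among concave functions.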



PROOF : For two statistically independent experiments the joint probability distribution $p_{j\alpha}$ is given by

$$p_{j\alpha} = p_j q_\alpha \qquad (2.7)$$



Then according to the axiom of additivity of entropy (2.5), we have

$$\sum_{j} \sum_{\alpha} \phi(p_j q_\alpha) = \sum_{j} \phi(p_j) + \sum_{\alpha} \phi(q_\alpha) \qquad (2.8)$$

Let us now make small changes of the probabilities $p_k$ and $p_j$ of the probability distribution
$P = (p_1, p_2, \ldots, p_j, \ldots, p_k, \ldots, p_n)$, leaving the others undisturbed and keeping the normalization condition fixed.
By the axiom of continuity of $\phi$ the relation (2.8) can be reduced to the form

$$\sum_{\alpha} q_\alpha \left[ \phi'(p_j q_\alpha) - \phi'(p_k q_\alpha) \right] = \phi'(p_j) - \phi'(p_k) \qquad (2.9)$$

The r.h.s. of (2.9) is independent of $q_\alpha$, and the relation (2.9) is satisfied independently of the p's if

$$\phi'(q_\alpha p_j) - \phi'(q_\alpha p_k) = \phi'(p_j) - \phi'(p_k) \qquad (2.10)$$

The above leads to Cauchy's functional equation

$$\phi'(q_\alpha p_j) = \phi'(q_\alpha) + \phi'(p_j) \qquad (2.11)$$
The solution of the functional equation (2.11) is given by

$$\phi'(p_j) = A \log p_j + B \qquad (2.12)$$

or

$$\phi(p_j) = A p_j \log p_j + (B - A) p_j + C \qquad (2.13)$$

where A, B and C are all constants. The condition of concavity (axiom (3)) requires $A < 0$, and let us
take $A = -k$ where $k\,(> 0)$ is a positive constant by axiom (1). The generalized entropy (2.4) then
reduces to the form

$$H(P) = -k \sum_{j} p_j \log p_j + (B - A) + C \qquad (2.14)$$

or

$$H(P) = -k \sum_{j} p_j \log p_j \qquad (2.15)$$

where the constants (B - A) and C have been omitted without changing the character of the entropy
function. This proves the theorem.
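The key steps of the proof can be checked numerically. The sketch below is a minimal illustration, assuming arbitrary values for the constants A, B, C (with A < 0) and for the probabilities; none of these numbers come from the text. It verifies that the derivative (2.12) satisfies the relation (2.10), that (2.13) is indeed an antiderivative of (2.12), and that the second derivative $A/p$ is negative, as concavity requires.

```python
import math

# Illustrative constants (assumed); concavity requires A < 0.
A, B, C = -1.0, 0.3, 0.0

def phi(p):
    """Equation (2.13): phi(p) = A p log p + (B - A) p + C."""
    return A * p * math.log(p) + (B - A) * p + C

def dphi(p):
    """Equation (2.12): phi'(p) = A log p + B."""
    return A * math.log(p) + B

def numeric_derivative(f, p, eps=1e-6):
    """Central finite-difference approximation of f'(p)."""
    return (f(p + eps) - f(p - eps)) / (2 * eps)

pj, pk, q = 0.3, 0.5, 0.4  # assumed probabilities

# Relation (2.10): phi'(q pj) - phi'(q pk) = phi'(pj) - phi'(pk).
lhs_210 = dphi(q * pj) - dphi(q * pk)
rhs_210 = dphi(pj) - dphi(pk)

# (2.13) should differentiate back to (2.12).
derivative_error = abs(numeric_derivative(phi, pj) - dphi(pj))

# Second derivative of (2.13) is A / p, negative on (0, 1] when A < 0.
second_derivative = A / pj

print(lhs_210, rhs_210, derivative_error, second_derivative)
```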

3. Total Shannon Entropy and Entropy of Continuous Distribution 

The definition (2.3) of entropy can be generalized straightforwardly to define the entropy of a 
discrete random variable. 

DEFINITION : Let $X \in \mathbb{R}$ denote a discrete random variable which takes the values
$x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively. The entropy H(X) of X is then defined by
the expression [3]

$$H(X) = -k \sum_{i=1}^{n} p_i \log p_i \qquad (3.1)$$

Let us now generalize the above definition to take account of an additional uncertainty due to
the observer himself, irrespective of the definition of the random experiment. Let X denote a discrete
random variable which takes the values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$. We decompose
the practical observation of X into two stages. First, we assume that $X \in L(x_i)$ with probability
$p_i$, where $L(x_i)$ denotes the ith interval of the set $\{L(x_1), L(x_2), \ldots, L(x_n)\}$ of intervals indexed by
$x_i$. The Shannon entropy of this experiment is H(X). Second, given that X is known to be in
the ith interval, we determine its exact position in $L(x_i)$, and we assume that the entropy of this
experiment is $U(x_i)$. Then the global entropy associated with the random variable X is given by

$$H_T(X) = H(X) + \sum_{i=1}^{n} p_i U(x_i) \qquad (3.2)$$

Let $h_i$ denote the length of the ith interval $L(x_i)$, $(i = 1, 2, \ldots, n)$, and define

$$U(x_i) = k \log h_i \qquad (3.3)$$

We have then

$$H_T(X) = H(X) + k \sum_{i=1}^{n} p_i \log h_i = -k \sum_{i=1}^{n} p_i \log \frac{p_i}{h_i} \qquad (3.4)$$

The expression $H_T(X)$ given by (3.4) will be referred to as the total entropy of the random variable
X. The above derivation is physical. In fact, what we have used is merely a randomization
of the individual event $X = x_i$, $(i = 1, 2, \ldots, n)$, to account for the additional uncertainty due to
the observer himself, irrespective of the definition of the random experiment [3]. We shall now derive the
expression (3.4) axiomatically as a generalization of Theorem (2.1).
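The two-stage construction of the total entropy can be illustrated with a short numerical sketch. The probabilities and interval lengths below are assumed values chosen for illustration, the function names are hypothetical, and k = 1 with natural logarithms is assumed. The sketch evaluates (3.2) with $U(x_i) = k \log h_i$ and compares it with the compact form (3.4).

```python
import math

def total_entropy_two_stage(p, h, k=1.0):
    """H_T = H(X) + sum_i p_i U(x_i), with U(x_i) = k log h_i (eqs. 3.2, 3.3)."""
    shannon = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
    observational = k * sum(pi * math.log(hi) for pi, hi in zip(p, h))
    return shannon + observational

def total_entropy_compact(p, h, k=1.0):
    """Compact form (3.4): H_T = -k sum_i p_i log(p_i / h_i)."""
    return -k * sum(pi * math.log(pi / hi) for pi, hi in zip(p, h) if pi > 0)

# Assumed probabilities and interval lengths.
p = [0.5, 0.3, 0.2]
h = [0.1, 0.2, 0.4]

two_stage = total_entropy_two_stage(p, h)
compact = total_entropy_compact(p, h)

# With unit interval lengths the observational term vanishes.
unit_width = total_entropy_two_stage(p, [1.0, 1.0, 1.0])
shannon_only = -sum(pi * math.log(pi) for pi in p)

print(two_stage, compact, unit_width, shannon_only)
```

When every $h_i = 1$ the total entropy coincides with the ordinary Shannon entropy, and intervals shorter than the unit length lower it, so the total entropy can be negative.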

THEOREM (3.1) : Let the generalized entropy (2.2) satisfy, in addition to the axioms (1)
to (4) of Theorem (2.1), the boundary conditions

$$\phi_i(1) = k \log h_i, \quad (i = 1, 2, \ldots, n) \qquad (3.5)$$

to take account of the post-observational uncertainty, where $h_i$ is the length of the ith class $L(x_i)$
(or the width of the observational value $x_i$). Then the entropy function reduces to the form of the total
entropy (3.4).

PROOF : The procedure is the same as that of Theorem (2.1) up to the relation (2.12):

$$\phi'(p_j) = A \log p_j + B \qquad (3.6)$$

Integrating (3.6) with respect to $p_j$ and using the boundary condition (3.5), we have

$$\phi(p_j) - k \log h_j = A p_j \log p_j + (B - A) p_j - B \qquad (3.7)$$



so that the generalized entropy (2.2) reduces to the form

$$H_T(X) = -k \sum_{j=1}^{n} p_j \log \frac{p_j}{h_j} \qquad (3.8)$$

where we have taken $A = -k < 0$ for the same unit of measurement of entropy, the negative
sign taking account of axiom (1). The constants appearing in (3.8) have been neglected without
any loss of characteristic properties. The expression (3.8) is the required expression of total entropy
obtained earlier.

Let us now see how to obtain the entropy of a continuous probability distribution as a limiting value
of the total entropy $H_T(X)$ defined above. For this let us first define the differential entropy $H_C(X)$
of a continuous random variable X.

DEFINITION : The differential entropy $H_C(X)$ of a continuous random variable X with probability
density f(x) is defined by [9]

$$H_C(X) = -k \int_R f(x) \log f(x)\, dx \qquad (3.9)$$

where R is the support set of the random variable X. We divide the range of X into bins of length
(or width) h. Let us assume that the density f(x) is continuous within the bins. Then by the mean
value theorem, there exists a value $x_i$ within each bin such that

$$h f(x_i) = \int_{L(x_i)} f(x)\, dx \qquad (3.10)$$

where $L(x_i)$ denotes the ith bin. We define the quantized or discrete probability distribution $(p_1, p_2, \ldots, p_n)$ by

$$p_i = \int_{L(x_i)} f(x)\, dx \qquad (3.11)$$

so that we have then

$$p_i = h f(x_i) \qquad (3.12)$$



The total entropy $H_T(X)$ defined for $h_i = h$ $(i = 1, 2, \ldots, n)$,

$$H_T(X) = -k \sum_{i=1}^{n} p_i \log \frac{p_i}{h} \qquad (3.13)$$

then reduces to the form

$$H_T(X) = -k \sum_{i=1}^{n} h f(x_i) \log f(x_i) \qquad (3.14)$$



Let $h \to 0$. Then by the definition of the Riemann integral we have $H_T(X) \to H_C(X)$ as $h \to 0$, that is,

$$\lim_{h \to 0} H_T(X) = H_C(X) = -k \int_R f(x) \log f(x)\, dx \qquad (3.15)$$

Thus we have the following theorem :

THEOREM (3.2) : The total entropy $H_T(X)$ defined by (3.13) approaches the differential
entropy $H_C(X)$ in the limiting case when the length of each bin tends to zero.
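The limit stated in Theorem (3.2) can be observed numerically. The following sketch is an illustration under several assumptions that are not part of the text: the density is taken to be a standard Gaussian, k = 1, the point $x_i$ in each bin is approximated by the bin midpoint rather than the exact mean-value point, and the range is truncated to ten standard deviations on each side. It evaluates the Riemann-sum form (3.14) for decreasing bin widths and compares it with the known differential entropy $\frac{1}{2}\log(2\pi e)$ of the standard Gaussian.

```python
import math

def gaussian_pdf(x, sigma=1.0):
    """Density of a zero-mean Gaussian (assumed example density)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def total_entropy_binned(h, sigma=1.0, k=1.0, span=10.0):
    """Form (3.14): -k * sum_i h f(x_i) log f(x_i), with x_i taken as bin midpoints."""
    n = int(round(2 * span * sigma / h))  # number of bins covering [-span*sigma, span*sigma]
    total = 0.0
    for i in range(n):
        x = -span * sigma + (i + 0.5) * h
        f = gaussian_pdf(x, sigma)
        if f > 0:
            total -= k * h * f * math.log(f)
    return total

# Differential entropy of the standard Gaussian for k = 1.
exact = 0.5 * math.log(2 * math.pi * math.e)

bin_widths = (2.0, 1.0, 0.5)
errors = [abs(total_entropy_binned(h) - exact) for h in bin_widths]

for h, err in zip(bin_widths, errors):
    print(h, err)
```

For this smooth density the discrepancy shrinks very quickly as h decreases; densities with less regularity would generally converge more slowly.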

4. Application: Differential Entropy and Entropy in Classical Statistics

The above analysis leads to an important relation connecting quantized entropy and differential
entropy. From (3.13) and (3.15) we see that

$$-k \sum_{i=1}^{n} p_i \log p_i \;\to\; -k \int_R f(x) \log\{h f(x)\}\, dx \qquad (4.1)$$

showing that when $h \to 0$, that is, when the length h of the bins is very small, the quantized entropy
given by the l.h.s. of (4.1) approaches not the differential entropy $H_C(X)$ defined in (3.9) but
the form given by the r.h.s. of (4.1), which we call the modified differential entropy. This relation has
important physical significance in statistical mechanics. As an application of this relation we now
find the expression of classical entropy as a limiting case of quantized entropy.
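The distinction between the quantized entropy and the differential entropy can be made concrete with a small numerical experiment. The sketch below is a hedged illustration: it assumes a standard Gaussian density, k = 1, and exact bin probabilities obtained from the Gaussian distribution function via `math.erf`; the function names and bin widths are chosen for illustration only. It compares the quantized entropy $-\sum_i p_i \log p_i$ with the modified differential entropy $H_C(X) - \log h$, which is what $-\int f \log\{h f\}\,dx$ evaluates to for a normalized density.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_entropy(h, span=8.0):
    """-sum_i p_i log p_i for bins of width h covering [-span, span], with k = 1."""
    n = int(round(2 * span / h))
    total = 0.0
    for i in range(n):
        a = -span + i * h
        p = normal_cdf(a + h) - normal_cdf(a)  # exact probability of the ith bin
        if p > 0:
            total -= p * math.log(p)
    return total

# Differential entropy of the standard Gaussian.
h_c = 0.5 * math.log(2 * math.pi * math.e)

bin_widths = (0.5, 0.1, 0.02)
# Gap between the quantized entropy and the modified form H_C - log h.
gaps = [quantized_entropy(h) - (h_c - math.log(h)) for h in bin_widths]
finest = quantized_entropy(bin_widths[-1])

for h, gap in zip(bin_widths, gaps):
    print(h, gap)
```

The gap to the modified form shrinks with h, while the quantized entropy itself keeps growing like $-\log h$ and therefore does not settle at $H_C(X)$.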

Let us consider an isolated system with configuration space volume V and a fixed number of
particles N, which is constrained to the energy shell $R = (E, E + \Delta E)$. We consider the energy shell
rather than just the energy surface because the Heisenberg uncertainty principle tells us that we can
never determine the energy E exactly, although we can make $\Delta E$ as small as we like. Let $f(X^N)$ be the
probability density of microstates defined on the phase space $\Gamma = \{X^N = (q_1, q_2, \ldots, q_{3N}; p_1, p_2, \ldots, p_{3N})\}$.
The normalization condition is

$$\int_R f(X^N)\, dX^N = 1 \qquad (4.2)$$

where

$$R = \{X^N : E < H(X^N) < E + \Delta E\} \qquad (4.3)$$

Following (4.1) we define the entropy of the system as

$$S = -k \int f(X^N) \ln\{C_N f(X^N)\}\, dX^N \qquad (4.4)$$

The constant $C_N$ appearing in (4.4) is to be determined later on. The probability density for
statistical equilibrium, determined by maximizing the entropy (4.4) subject to the condition (4.2),
is

$$f(X^N) = \begin{cases} \dfrac{1}{\Omega(E, V, N)}, & E < H(X^N) < E + \Delta E \\ 0, & \text{otherwise} \end{cases} \qquad (4.5)$$

where $H(X^N)$ is the Hamiltonian of the system and $\Omega(E, V, N)$ is the volume of the energy shell
$(E, E + \Delta E)$ [10]. Putting (4.5) in (4.4) we obtain the entropy of the system as [10]

$$S = k \ln\left\{ \frac{\Omega(E, V, N)}{C_N} \right\} \qquad (4.6)$$
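The step from (4.4) and (4.5) to (4.6) can be illustrated with a toy calculation. The sketch below is not a physical simulation: it assumes a one-dimensional stand-in for the energy shell with volume `omega`, an arbitrary cell size `c_n` playing the role of $C_N$, k = 1, and a simple discretization into equal cells; all names and numbers are hypothetical. It evaluates the integral (4.4) for the uniform density (4.5) and for a deliberately non-uniform density on the same shell.

```python
import math

def shell_entropy(density, dx, c_n, k=1.0):
    """Discretized form of (4.4): S = -k * sum f ln(C_N f) dx over the shell."""
    return -k * sum(f * math.log(c_n * f) * dx for f in density if f > 0)

# Toy shell of volume omega split into equal cells (assumed values).
omega, c_n, cells = 50.0, 0.5, 1000
dx = omega / cells

# Uniform density (4.5) on the shell.
uniform = [1.0 / omega] * cells
# A normalized but non-uniform density on the same shell, for comparison.
lopsided = [1.5 / omega] * (cells // 2) + [0.5 / omega] * (cells // 2)

s_uniform = shell_entropy(uniform, dx, c_n)
s_lopsided = shell_entropy(lopsided, dx, c_n)
s_boltzmann = math.log(omega / c_n)  # right-hand side of (4.6) with k = 1

print(s_uniform, s_lopsided, s_boltzmann)
```

In this toy setting the uniform density reproduces $k \ln(\Omega / C_N)$ and the non-uniform density gives a smaller value, in line with the maximization argument above.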

The constant $C_N$ has the same unit as $\Omega(E, V, N)$ and cannot be determined classically. However,
it can be determined from quantum mechanics: we have $C_N = h^{3N}$ for distinguishable
particles and $C_N = N!\, h^{3N}$ for indistinguishable particles. From the Heisenberg uncertainty principle,
we know that if $h^{3N}$ is the volume of a single state in phase space, then $\Omega(E, V, N)/h^{3N}$ is the total
number of microstates in the energy shell $(E, E + \Delta E)$. The expression (4.6) then becomes identical
to the Boltzmann entropy. With this interpretation of the constant $C_N$, the correct expression of
classical entropy is given by [10, 11]

$$S = -k \int_\Gamma f(X^N) \ln\{h^{3N} f(X^N)\}\, dX^N \qquad (4.7)$$

The classical entropy that follows as a limiting case of the von Neumann entropy is given by [12]

$$S = -k \int_\Gamma f(X^N) \ln f(X^N)\, dX^N$$



This is, however, different from the one given by (4.7), and it does not lead to the form of the Boltzmann
entropy (4.6).

5. Conclusion

The literature on the axiomatic derivation of Shannon entropy is vast [6, 7]. The present
approach is, however, different. It is based mainly on the postulates of additivity and concavity
of the entropy function. There are, in fact, variant forms of additivity and non-decreasing character of
entropy in thermodynamics. The concept of additivity is dormant in many axiomatic derivations
of Shannon entropy. It plays a vital role in the foundation of Shannon information theory [13].
Non-additive entropies like Rényi's entropy and Tsallis entropy need a different formulation and
lead to different physical phenomena [14, 15]. In the present paper we have also provided a new
axiomatic derivation of the total Shannon entropy, which in the limiting case reduces to the expression
of the modified differential entropy (4.1). The modified differential entropy together with the quantum
uncertainty relation provides a mathematically strong approach to the derivation of the expression
for classical entropy.